Working Dates, Time Series, and Forecasting

Module 11

Ray J. Hoobler

Libraries

Code
library(tidyverse)
library(lubridate)
library(fpp3) # Forecasting: Principles and Practice aligned with Tidyverse

Resource for Working with Dates

Chapter 17

Resources for Time Series Models

Dates

Dates in R

Current standard for working with dates: ISO8601

Historical approaches to date handling in R (with help from Claude.ai)

  1. Date class (introduced in R 1.0.0)

The original Date class stored dates as number of days since 1970-01-01

Code
date_old <- as.Date("2024-01-01")
print(as.numeric(date_old))  # Days since epoch
[1] 19723
  1. POSIXct format (Pre-ISO 8601 standard)

Stored as seconds since 1970-01-01

Code
old_date_ct <- as.POSIXct("2024-01-01 12:00:00")
print(as.numeric(old_date_ct))  # Seconds since epoch
[1] 1704135600
  1. POSIXlt format (list-based representation)
Code
old_date_lt <- as.POSIXlt("2024-01-01 12:00:00")
str(unclass(old_date_lt))  # Shows internal structure
List of 11
 $ sec   : num 0
 $ min   : int 0
 $ hour  : int 12
 $ mday  : int 1
 $ mon   : int 0
 $ year  : int 124
 $ wday  : int 1
 $ yday  : int 0
 $ isdst : int 0
 $ zone  : chr "MST"
 $ gmtoff: int NA
 - attr(*, "tzone")= chr [1:3] "" "MST" "MDT"
 - attr(*, "balanced")= logi TRUE

Common historical date formats and parsing methods

Code
dates <- c(
    "01/02/03",           # Ambiguous MM/DD/YY
    "01-02-2003",         # DD-MM-YYYY or MM-DD-YYYY
    "January 2, 2003",    # Month name format
    "2003.01.02"          # Period-separated
)

Historical parsing approaches

strptime() - Traditional parsing method

Code
strptime(dates[1], "%m/%d/%y")
[1] "2003-01-02 MST"
Code
strptime(dates[2], "%d-%m-%Y")
[1] "2003-02-01 MST"
Code
strptime(dates[3], "%B %d, %Y")
[1] "2003-01-02 MST"

as.Date() with format specification

Code
as.Date(dates[1], format="%m/%d/%y")
[1] "2003-01-02"
Code
as.Date(dates[2], format="%d-%m-%Y")
[1] "2003-02-01"

Handling different locales (pre-standardization)

Code
Sys.setlocale("LC_TIME", "C")  # POSIX locale
[1] "C"
Code
format(as.Date("2024-01-01"), "%a %b %d %Y")
[1] "Mon Jan 01 2024"
Code
Sys.setlocale("LC_TIME", "en_US.UTF-8")  # US locale
[1] "en_US.UTF-8"
Code
format(as.Date("2024-01-01"), "%a %b %d %Y")
[1] "Mon Jan 01 2024"

Historical challenges with time zones

Pre-ISO 8601, timezone handling was less standardized

Code
ct_ny <- as.POSIXct("2024-01-01 12:00:00", tz="America/New_York")
ct_utc <- as.POSIXct("2024-01-01 12:00:00", tz="UTC")
print(difftime(ct_ny, ct_utc))
Time difference of 5 hours

Demonstrating historical ambiguity issues

Code
ambiguous_date <- "03/04/05"

Could be interpreted multiple ways:

Code
as.Date(ambiguous_date, format="%m/%d/%y")  # US interpretation
[1] "2005-03-04"
Code
as.Date(ambiguous_date, format="%d/%m/%y")  # European interpretation
[1] "2005-04-03"
Code
as.Date(ambiguous_date, format="%y/%m/%d")  # ISO-like interpretation
[1] "2003-04-05"

Modern date handling in R (with help from Claude.ai)

The current ISO 8601 standard in R offers several key advantages:

  1. Unambiguous Format:
    • Uses YYYY-MM-DD as the base format
    • Eliminates confusion between US and European date formats
    • Supports both date and time with timezone information
  1. Key Features:
    • Hierarchical ordering (largest to smallest units)
    • Optional time components with T separator
    • Standardized timezone offsets
    • Support for weeks and ordinal dates
  1. Main Benefits:
    • Machine-readable
    • Sorts correctly as text
    • Internationally recognized
    • Reduces parsing errors
  1. Best Practices:
    • Store dates in UTC
    • Use local timezones only for display
    • Always include timezone information for datetime values
    • Use the lubridate package for easier handling

Reading Dates in the Tidyverse

Here, the date is still being read as a character string.

Code
# A simple data frame with dates 
dates_df <- tibble(
    date = c("2024-01-01", "2024-02-01", "2024-03-01"),
    value = c(1, 2, 3)
)

dates_df

Reading Dates from a CSV File

The read_csv() function from the readr package will automatically parse dates.

Code
dates_df |> 
  write_csv("datasets/dates.csv")

read_csv("datasets/dates.csv")

Coercing Dates in the Tidyverse

We can use the lubridate package to coerce the date column to a date object.

Code
dates_df2 <- dates_df |> 
  mutate(date = as_date(date)) # as_date() is part of the lubridate package 

dates_df2

ggplot2 Recogizes Dates

We can then format them as needed.

Code
dates_df2 |> 
  ggplot(aes(x = date, y = value)) +
  geom_line() +
  geom_point() +
  theme_light()

Code
dates_df2 |> 
  ggplot(aes(x = date, y = value)) +
  geom_line() +
  geom_point() +
  scale_x_date(
    date_labels = ("%b %d %Y"), 
    breaks = dates_df2$date) +
  labs(
    x = NULL
  ) +
  theme_light()

What is a Time Series?

Definition of Time Series

An ordered sequence of values of a variable at equally spaced time intervals.

From fpp3:

Examples of time series data include:

Annual Google profits
Quarterly sales results for Amazon
Monthly rainfall
Weekly retail sales
Daily IBM stock prices
Hourly electricity demand
5-minute freeway traffic counts
Time-stamped stock transaction data

Anything that is observed sequentially over time is a time series.

ts() data structure (1/2)

Univariate Time Series

Code
str(co2)
 Time-Series [1:468] from 1959 to 1998: 315 316 316 318 318 ...
Code
co2
        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
1959 315.42 316.31 316.50 317.56 318.13 318.00 316.39 314.65 313.68 313.18
1960 316.27 316.81 317.42 318.87 319.87 319.43 318.01 315.74 314.00 313.68
1961 316.73 317.54 318.38 319.31 320.42 319.61 318.42 316.63 314.83 315.16
1962 317.78 318.40 319.53 320.42 320.85 320.45 319.45 317.25 316.11 315.27
1963 318.58 318.92 319.70 321.22 322.08 321.31 319.58 317.61 316.05 315.83
1964 319.41 320.07 320.74 321.40 322.06 321.73 320.27 318.54 316.54 316.71
1965 319.27 320.28 320.73 321.97 322.00 321.71 321.05 318.71 317.66 317.14
1966 320.46 321.43 322.23 323.54 323.91 323.59 322.24 320.20 318.48 317.94
1967 322.17 322.34 322.88 324.25 324.83 323.93 322.38 320.76 319.10 319.24
1968 322.40 322.99 323.73 324.86 325.40 325.20 323.98 321.95 320.18 320.09
1969 323.83 324.26 325.47 326.50 327.21 326.54 325.72 323.50 322.22 321.62
1970 324.89 325.82 326.77 327.97 327.91 327.50 326.18 324.53 322.93 322.90
1971 326.01 326.51 327.01 327.62 328.76 328.40 327.20 325.27 323.20 323.40
1972 326.60 327.47 327.58 329.56 329.90 328.92 327.88 326.16 324.68 325.04
1973 328.37 329.40 330.14 331.33 332.31 331.90 330.70 329.15 327.35 327.02
1974 329.18 330.55 331.32 332.48 332.92 332.08 331.01 329.23 327.27 327.21
1975 330.23 331.25 331.87 333.14 333.80 333.43 331.73 329.90 328.40 328.17
1976 331.58 332.39 333.33 334.41 334.71 334.17 332.89 330.77 329.14 328.78
1977 332.75 333.24 334.53 335.90 336.57 336.10 334.76 332.59 331.42 330.98
1978 334.80 335.22 336.47 337.59 337.84 337.72 336.37 334.51 332.60 332.38
1979 336.05 336.59 337.79 338.71 339.30 339.12 337.56 335.92 333.75 333.70
1980 337.84 338.19 339.91 340.60 341.29 341.00 339.39 337.43 335.72 335.84
1981 339.06 340.30 341.21 342.33 342.74 342.08 340.32 338.26 336.52 336.68
1982 340.57 341.44 342.53 343.39 343.96 343.18 341.88 339.65 337.81 337.69
1983 341.20 342.35 342.93 344.77 345.58 345.14 343.81 342.21 339.69 339.82
1984 343.52 344.33 345.11 346.88 347.25 346.62 345.22 343.11 340.90 341.18
1985 344.79 345.82 347.25 348.17 348.74 348.07 346.38 344.51 342.92 342.62
1986 346.11 346.78 347.68 349.37 350.03 349.37 347.76 345.73 344.68 343.99
1987 347.84 348.29 349.23 350.80 351.66 351.07 349.33 347.92 346.27 346.18
1988 350.25 351.54 352.05 353.41 354.04 353.62 352.22 350.27 348.55 348.72
1989 352.60 352.92 353.53 355.26 355.52 354.97 353.75 351.52 349.64 349.83
1990 353.50 354.55 355.23 356.04 357.00 356.07 354.67 352.76 350.82 351.04
1991 354.59 355.63 357.03 358.48 359.22 358.12 356.06 353.92 352.05 352.11
1992 355.88 356.63 357.72 359.07 359.58 359.17 356.94 354.92 352.94 353.23
1993 356.63 357.10 358.32 359.41 360.23 359.55 357.53 355.48 353.67 353.95
1994 358.34 358.89 359.95 361.25 361.67 360.94 359.55 357.49 355.84 356.00
1995 359.98 361.03 361.66 363.48 363.82 363.30 361.94 359.50 358.11 357.80
1996 362.09 363.29 364.06 364.76 365.45 365.01 363.70 361.54 359.51 359.65
1997 363.23 364.06 364.61 366.40 366.84 365.68 364.52 362.57 360.24 360.83
        Nov    Dec
1959 314.66 315.43
1960 314.84 316.03
1961 315.94 316.85
1962 316.53 317.53
1963 316.91 318.20
1964 317.53 318.55
1965 318.70 319.25
1966 319.63 320.87
1967 320.56 321.80
1968 321.16 322.74
1969 322.69 323.95
1970 323.85 324.96
1971 324.63 325.85
1972 326.34 327.39
1973 327.99 328.48
1974 328.29 329.41
1975 329.32 330.59
1976 330.14 331.52
1977 332.24 333.68
1978 333.75 334.78
1979 335.12 336.56
1980 336.93 338.04
1981 338.19 339.44
1982 339.09 340.32
1983 340.98 342.82
1984 342.80 344.04
1985 344.06 345.38
1986 345.48 346.72
1987 347.64 348.78
1988 349.91 351.18
1989 351.14 352.37
1990 352.69 354.07
1991 353.64 354.89
1992 354.09 355.33
1993 355.30 356.78
1994 357.59 359.05
1995 359.61 360.74
1996 360.80 362.38
1997 362.49 364.34

ts() data structure (2/2)

Multivariate Time Series

Code
str(DAAG::greatLakes)
 Time-Series [1:92, 1:4] from 1918 to 2009: 174 174 174 174 174 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:4] "Erie" "michHuron" "Ontario" "StClair"
Code
DAAG::greatLakes
Time Series:
Start = 1918 
End = 2009 
Frequency = 1 
         Erie michHuron  Ontario  StClair
1918 174.0150  176.8867 74.87250 174.9567
1919 174.1808  176.7450 74.95583 175.0767
1920 173.9083  176.6250 74.56750 174.8150
1921 174.0258  176.4883 74.66583 174.8692
1922 173.9275  176.4450 74.65667 174.8108
1923 173.7575  176.2642 74.44333 174.6525
1924 173.8433  176.1867 74.55750 174.6375
1925 173.5933  175.9192 74.39000 174.4058
1926 173.6408  175.8850 74.40250 174.4017
1927 173.8183  176.1483 74.60667 174.5983
1928 173.9517  176.4433 74.80083 174.8050
1929 174.3117  176.8958 75.08500 175.2642
1930 174.2700  176.6508 75.08083 175.0950
1931 173.7050  176.1183 74.35750 174.5083
1932 173.7358  175.9408 74.50250 174.5658
1933 173.6767  175.8675 74.27583 174.5467
1934 173.3317  175.7667 74.02167 174.2742
1935 173.4108  175.8908 74.00417 174.3608
1936 173.4883  175.9392 74.16417 174.3933
1937 173.8200  175.9225 74.49583 174.6033
1938 173.8408  176.1408 74.52250 174.6892
1939 173.8458  176.2692 74.49250 174.7208
1940 173.7950  176.1417 74.42583 174.6450
1941 173.7000  176.1217 74.40750 174.5817
1942 173.8767  176.3342 74.51667 174.7675
1943 174.2317  176.6267 75.13417 175.1125
1944 174.0917  176.5967 74.84583 174.9817
1945 174.1883  176.5700 75.06000 175.0892
1946 174.1358  176.6017 74.99333 175.0383
1947 174.2350  176.5667 75.16917 175.1042
1948 174.2208  176.5308 75.08083 175.0975
1949 173.9733  176.2108 74.74083 174.8317
1950 174.1350  176.2608 74.89167 174.9300
1951 174.3242  176.7358 75.24667 175.2408
1952 174.5200  177.0850 75.36583 175.4992
1953 174.3308  176.9333 75.01500 175.3233
1954 174.2883  176.8292 74.98167 175.2350
1955 174.3483  176.7225 75.14583 175.2517
1956 174.1200  176.4400 74.92167 174.9392
1957 174.0367  176.2633 74.63333 174.8800
1958 173.8083  176.0675 74.42833 174.6292
1959 173.8500  176.0058 74.47500 174.6567
1960 174.0942  176.4775 74.71250 174.9900
1961 174.1008  176.3767 74.61333 174.9633
1962 173.9075  176.2225 74.64167 174.7700
1963 173.7267  175.9225 74.65500 174.5600
1964 173.6075  175.6825 74.28667 174.3650
1965 173.7125  175.9158 74.39583 174.5333
1966 173.8992  176.1608 74.69000 174.7567
1967 174.0592  176.3008 74.85833 174.9250
1968 174.2042  176.4467 74.79917 175.0608
1969 174.3900  176.6958 74.74917 175.2617
1970 174.2583  176.6783 74.68583 175.1500
1971 174.3117  176.8050 74.72333 175.2692
1972 174.5000  176.8883 74.93500 175.4208
1973 174.7408  177.1233 75.21000 175.6500
1974 174.6892  177.0933 75.06583 175.6133
1975 174.6158  176.9733 74.81000 175.5158
1976 174.5733  176.8992 75.04417 175.4783
1977 174.2975  176.5050 74.75833 175.1992
1978 174.3725  176.5908 74.92083 175.2708
1979 174.3917  176.7942 74.84000 175.3558
1980 174.5217  176.8033 74.82333 175.4075
1981 174.3983  176.6983 74.78583 175.2958
1982 174.4250  176.5983 74.75250 175.2992
1983 174.5208  176.8333 74.82167 175.4333
1984 174.5225  176.8950 74.90167 175.4350
1985 174.7283  177.1267 74.87750 175.7000
1986 174.8983  177.2925 75.14750 175.8517
1987 174.6675  176.9700 74.84917 175.5608
1988 174.2617  176.5642 74.68000 175.1933
1989 174.2258  176.4008 74.76083 175.1092
1990 174.2942  176.3500 74.80417 175.1308
1991 174.3267  176.4692 74.87000 175.1975
1992 174.3567  176.4792 74.83250 175.2200
1993 174.5042  176.6958 75.04250 175.3808
1994 174.3667  176.6783 74.75500 175.2900
1995 174.2842  176.5275 74.73917 175.1833
1996 174.3750  176.6542 74.87333 175.2467
1997 174.7175  176.9842 74.96250 175.6125
1998 174.5508  176.7167 74.91000 175.3975
1999 174.1025  176.2358 74.57250 174.9758
2000 173.9900  175.9783 74.78000 174.7925
2001 173.9142  175.9508 74.69917 174.7617
2002 174.0583  176.1183 74.79000 174.8833
2003 173.9658  175.8917 74.71333 174.7375
2004 174.1158  176.1108 74.84833 174.9167
2005 174.1700  176.0900 74.82167 174.9383
2006 174.1467  176.0158 74.83083 174.8917
2007 174.1392  175.9433 74.75167 174.8483
2008 174.1592  176.0050 74.87500 174.9200
2009 174.2483  176.2583 74.87333 175.0767

Analyzing Time Series Data: Moving Averages

Moving Averages (1/8)

Code
supplier <- read_table(
  "Supplier     Cost    Error   ErrorSquared
1   9   -1  1
2   8   -2  4
3   9   -1  1
4   12  2   4
5   9   -1  1
6   12  2   4
7   11  1   1
8   7   -3  9
9   13  3   9
10  9   -1  1
11  11  1   1
12  10  0   0 ", col_names = TRUE
)

supplier

The mean cost is: 10.
The SSE is: 36.
The MSE is: 3.

Is the mean a good estimate of the cost?

Moving Averages (2/8)

Code
supplier |> 
  ggplot(aes(x = Supplier, y = Cost)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = mean(supplier$Cost), color = "red") +
  theme_light()

Moving Averages (3/8)

Code
pc_ebt <- read_table(
  "Year     Earnings_M  Mean    Error   Squared_Error
1985    46.163  48.676  -2.513  6.313
1986    46.998  48.676  -1.678  2.814
1987    47.816  48.676  -0.860  0.739
1988    48.311  48.676  -0.365  0.133
1989    48.758  48.676  0.082   0.007
1990    49.164  48.676  0.488   0.239
1991    49.548  48.676  0.872   0.761
1992    48.915  48.676  0.239   0.057
1993    50.315  48.676  1.639   2.688
1994    50.768  48.676  2.092   4.378", col_names = TRUE
)

pc_ebt

The mean Earnings ($M) is: 48.68.
The SSE is: 18.129.
The MSE is: 1.81.

Is the mean a good estimate of the Earnings?

Moving Averages (4/8)

Plot of earnings over tiem

Code
pc_ebt |> 
  ggplot(aes(x = Year, y = Earnings_M)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = mean(pc_ebt$Earnings_M), color = "red") +
  theme_light()

Moving Averages (5/8)

Code
quarterly_sales <- read_table(
  "Year     Quarter     Period     Sales
90     1      1     362
90     2      2     385
90     3      3     432
90     4      4     341
91     1      5     382
91     2      6     409
91     3      7     498
91     4      8     387
92     1      9     473
92     2     10     513
92     3     11     582
92     4     12     474
93     1     13     544
93     2     14     582
93     3     15     681
93     4     16     557
94     1     17     628
94     2     18     707
94     3     19     773
94     4     20     592
95     1     21     627
95     2     22     725
95     3     23     854
95     4     24     661", col_names = TRUE
)

quarterly_sales

Moving Averages (6/8)

Code
quarterly_sales |> 
  ggplot(aes(x = Period, y = Sales)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = mean(quarterly_sales$Sales), color = "red") +
  theme_light()

Moving Averages (7/8)

Code
quarterly_sales_moving_avg <- quarterly_sales |> 
  mutate(
    moving_avg_3 = slider::slide_dbl(Sales, mean, .before = 1, .after = 1, .complete = TRUE),
    moving_avg_5 = slider::slide_dbl(Sales, mean, .before = 2, .after = 2, .complete = TRUE),
    moving_avg_7 = slider::slide_dbl(Sales, mean, .before = 3, .after = 3, .complete = TRUE)
  )

quarterly_sales_moving_avg

Moving Averages (8/8)

Code
quarterly_sales_moving_avg |> 
  ggplot(aes(x = Period)) +
  geom_line(aes(y = Sales, color = "Original")) +
  geom_line(aes(y = moving_avg_3, color = "3-Period MA")) +
  geom_line(aes(y = moving_avg_5, color = "5-Period MA")) +
  geom_line(aes(y = moving_avg_7, color = "7-Period MA")) +
  geom_smooth(aes(y = Sales, color = "Trend"), 
              method = "lm", formula = y ~ x,
              se = FALSE, linetype = "dashed") +
  scale_color_manual(values = c(
    "Original" = "black",
    "3-Period MA" = "red",
    "5-Period MA" = "blue",
    "7-Period MA" = "green",
    "Trend" = "purple"
  )) +
  labs(color = "Series") +  # Legend title
  theme_light()

Analyzing Time Series Data: Exponential Smoothing

Exponential Smoothing

  • Single Exponential Smoothing
  • Double Exponential Smoothing
  • Triple Exponential Smoothing

Single Exponential Smoothing

\[ S_t = \alpha \sum_{i=1}^{t-2} (1-\alpha)^{i-1} y_{t-i} + (1-\alpha)^{t-2} y_1 \, , \,\,\,\,\, t \ge 2 \, . \]

\[ \alpha \sum_{i=0}^{t-1} (1-\alpha)^i = \alpha \left[ \frac{1-(1-\alpha)^t}{1-(1-\alpha)} \right] = 1 - (1-\alpha)^t \]
\[ S_t = \alpha y_{t-1} + (1-\alpha)S_{t-1} \,\,\,\,\,\,\, 0 < \alpha \le 1 \,\,\,\,\,\,\, t \ge 3 \]

We can see that the summation term shows that the contribution to the smoothed value becomes less at each consecutive time period.

—6.4.3.1 Single Exponential Smoothing

Single Exponential Smoothing Plot

Code
# Plot of alpha (1 - alpha)^t for different values of alpha at t = 1:5
alpha_values <- c(0.1, 0.3, 0.5, 0.7, 0.9)
t_values <- 1:5

alpha_table <- tibble()

for (alpha in alpha_values) {
  alpha_table <- alpha_table |> 
    bind_rows(
      tibble(
        alpha = alpha,
        t = t_values,
        value = (1 - alpha)^t
      )
    )
}

alpha_table |>
  ggplot(aes(x = t, y = value, color = factor(alpha))) +
  geom_point() +
  geom_line() +
  labs(
    x = "t",
    y = expression((1 - alpha)^t),
    color = "alpha"
  ) +
  scale_y_continuous(breaks = seq(0, 1, 0.1)) +
  theme_light()

Simple Exponential Smoothing

\[ S_t = \alpha x_t + (1 - \alpha) S_{t-1} \quad \text{for} \quad 0 \le \alpha \le 1 \]

where \(S_t\) is the smoothed value at time \(t\), \(x_t\) is the observed value at time \(t\), and \(\alpha\) is the smoothing parameter.

Double Exponential Smoothing (Holt linear)

\[ S_t = \alpha x_t + (1 - \alpha) (S_{t-1} + b_{t-1}) \]

\[ b_t = \beta (S_t - S_{t-1}) + (1 - \beta) b_{t-1} \]

where \(b_t\) is the trend component at time \(t\), and \(\beta\) is the trend smoothing parameter.

Triple Exponential Smoothing (Holt-Winters)

\[ \begin{align} S_t &= \alpha \frac{x_t}{c_t - L} + (1 - \alpha) (S_{t-1} + b_{t-1}) \\ b_t &= \beta (S_t - S_{t-1}) + (1 - \beta) b_{t-1} \\ c_t &= \gamma \frac{x_t}{S_t} + (1 - \gamma) c_{t-L} \end{align} \]

Where \(L\) is the length of the seasonal cycle, \(c_t\) is the seasonal component at time \(t\), and \(\gamma\) is the seasonal smoothing parameter.

Data from NIST (6.4.3.2)

Fit below was using a single exponential smoothing model with alpha = 0.3

Code
sample_data <- read_table(
  "Data     Fit
6.4      NA
5.6     6.4
7.8     6.2
8.8     6.7
11.0    7.3
11.6    8.4
16.7    9.4
15.3    11.6
21.6    12.7
22.4    15.4", col_names = TRUE
)

sample_data |> 
  ggplot() +
  geom_line(aes(x = seq_along(Data), y = Fit), color = "blue", linetype = "dashed") +
  geom_line(aes(x = seq_along(Data), y = Data), color = "black") +
  theme_light()

Reproducing the fit

Function for Single Exponential Smoothing

Code
ses_rjh <- function(x, alpha = 0.3) {
  n <- length(x)
  S_t <- numeric(n)  # Initialize vector with zeros
  
  # Set first value
  S_t[1] <- x[1]
  
  # Calculate smoothed values
  for(t in 2:n) {
    S_t[t] <- alpha * x[t] + (1 - alpha) * S_t[t-1]
  }
  
  return(S_t)
}

Test the function

Code
ses_rjh(sample_data$Data, alpha = 0.3)
 [1]  6.400000  6.160000  6.652000  7.296400  8.407480  9.365236 11.565665
 [8] 12.685966 15.360176 17.472123

Plot the SES fit

Code
sample_data |> 
  mutate(fit2 = ses_rjh(Data, alpha = 0.3))  |> 
  ggplot(aes(x = seq_along(Data))) +
  geom_line(aes(y = Data), color = "black") +
  # geom_line(aes(y = Fit), color = "red") +
  geom_line(aes(y = fit2), color = "blue", linetype = "dashed") +
  theme_light()

Function for Double Exponential Smoothing

Code
des_rjh <- function(x, alpha = 0.3, beta = 0.3) {
  n <- length(x)
  S_t <- numeric(n)  # Initialize vector with zeros
  b_t <- numeric(n)  # Initialize vector with zeros
  
  # Set first value
  S_t[1] <- x[1]
  b_t[1] <- x[2] - x[1]
  
  # Calculate smoothed values
  for(t in 2:n) {
    S_t[t] <- alpha * x[t] + (1 - alpha) * (S_t[t-1] + b_t[t-1])
    b_t[t] <- beta * (S_t[t] - S_t[t-1]) + (1 - beta) * b_t[t-1]
  }
  
  return(S_t + b_t)
}

Test the function

Code
des_rjh(sample_data$Data, alpha = 0.3, beta = 0.3)
 [1]  5.600000  4.800000  5.170000  6.055700  7.780677  9.511900 12.900685
 [8] 15.068773 19.064245 22.401293

Plot the DES fit

Code
sample_data |> 
  mutate(fit2 = des_rjh(Data, alpha = 0.3, beta = 0.3))  |> 
  ggplot(aes(x = seq_along(Data))) +
  geom_line(aes(y = Data), color = "black") +
  # geom_line(aes(y = Fit), color = "red") +
  geom_line(aes(y = fit2), color = "blue") +
  theme_light()

Function for Triple Exponential Smoothing

Code
tes_rjh <- function(x, alpha = 0.3, b = 0.3, gamma = 0.3, m = 1) {
  n <- length(x)
  S_t <- numeric(n)  # Initialize vector with zeros
  b_t <- numeric(n)  # Initialize vector with zeros
  c_t <- numeric(n)  # Initialize vector with zeros
  
  # Set first value
  S_t[1] <- x[1]
  b_t[1] <- x[2] - x[1]
  c_t[1] <- x[3] - x[2]
  
  # Calculate smoothed values
  for(t in 2:n) {
    S_t[t] <- alpha * x[t] + (1 - alpha) * (S_t[t-1] + b_t[t-1])
    b_t[t] <- b * (S_t[t] - S_t[t-1]) + (1 - b) * b_t[t-1]
    c_t[t] <- gamma * (S_t[t] - S_t[t-1]) + (1 - gamma) * c_t[t-1]
  }
  
  return(S_t + b_t + c_t)
}

Test the function

Code
tes_rjh(sample_data$Data, alpha = 0.3, b = 0.3, gamma = 0.3)
 [1]  7.800000  6.100000  6.110000  6.881400  8.742664 10.601536 14.485987
 [8] 16.764129 21.273293 24.858676

Plot the fit TES fit

Code
sample_data |> 
  mutate(fit2 = tes_rjh(Data, alpha = 0.3, b = 0.3, gamma = 0.3))  |> 
  ggplot(aes(x = seq_along(Data))) +
  geom_line(aes(y = Data), color = "black") +
  # geom_line(aes(y = Fit), color = "red") +
  geom_line(aes(y = fit2), color = "blue") +
  theme_light()

Seasonal Data

Code
quarterly_sales <- read_table(
  "Year     Quarter     Period     Sales
90     1      1     362
90     2      2     385
90     3      3     432
90     4      4     341
91     1      5     382
91     2      6     409
91     3      7     498
91     4      8     387
92     1      9     473
92     2     10     513
92     3     11     582
92     4     12     474
93     1     13     544
93     2     14     582
93     3     15     681
93     4     16     557
94     1     17     628
94     2     18     707
94     3     19     773
94     4     20     592
95     1     21     627
95     2     22     725
95     3     23     854
95     4     24     661", col_names = TRUE
)

quarterly_sales

Use the tes_rjh function to “fit” the data “by eye”

Code
quarterly_sales |> 
  mutate(fit2 = tes_rjh(Sales, alpha = 0.7, b = 0.3, gamma = 0.9))  |> 
  ggplot(aes(x = seq_along(Sales))) +
  geom_line(aes(y = Sales), color = "black") +
  geom_line(aes(y = fit2), color = "blue") +
  theme_light()

Forecasting

Convert the quarterly data to a tsibble (1/3)

Code
quarterly_sales_ts_simple <- quarterly_sales |>
  select(Sales) |> 
  ts(start = c(1990, 1), frequency = 4) |> 
  as_tsibble()

quarterly_sales_ts_simple

Convert the quarterly data to a tsibble (2/3)

Code
## Multivariate
z <- ts(matrix(rnorm(300), 100, 3), start = c(1961, 1), frequency = 12)
head(z)
        Series 1     Series 2    Series 3
[1,]  1.41823406  0.301586182 -0.71067009
[2,]  0.48068296 -0.237580713 -0.01001643
[3,]  0.02610005 -0.008244583 -0.15080575
[4,]  1.39640321 -0.029422360  0.20491456
[5,]  1.31129210  1.016495942  0.47531015
[6,] -0.49807914 -0.786520756 -1.42158228

Convert the quarterly data to a tsibble (3/3)

Code
tbl3 <- tibble(
  mth = rep(yearmonth("2010 Jan") + 0:8, each = 3),
  xyz = rep(c("x", "y", "z"), each = 9),
  abc = rep(letters[1:3], times = 9),
  value = rnorm(27)
)
as_tsibble(tbl3, key = c(xyz, abc))

Use the ETS function from fpp3 (1/3)

Code
quarterly_sales

Use the ETS function from fpp3 (2/3)

Code
quarterly_sales_tsibble <- quarterly_sales |>
  mutate(
    # Convert 2-digit year to 4-digit year
    Year = 1900 + Year,
    # Create proper quarter dates by converting to months (Q1=1, Q2=4, Q3=7, Q4=10)
    Month = (Quarter - 1) * 3 + 1,
    # Create the yearquarter date
    YearQuarter = yearquarter(paste(Year, Month, "01", sep = "-"))
  ) |>
  # Select needed columns and convert to tsibble
  select(YearQuarter, Sales) |>
  as_tsibble(
    index = YearQuarter
  )

quarterly_sales_tsibble

An easier way to get the data frame

Code
quarterly_sales_tsibble2 <- quarterly_sales |>
 select(Sales) |> 
  ts(start = c(1990, 1), frequency = 4) |>
  as_tsibble()

quarterly_sales_tsibble2

Use the ETS function from fpp3 (3/3)

Code
fit_ets <- quarterly_sales_tsibble |>
  model(model_ets = ETS(Sales ~ error("A") + trend("A") + season("A")))

autoplot(quarterly_sales_tsibble, Sales) +
  geom_line(aes(y = .fitted), data = augment(fit_ets), color = "blue")

Forecasting with ETS

Code
fit_ets |> 
  forecast(h = 8) |> 
  autoplot(quarterly_sales_tsibble)

Fit and Forecasat with ETS

Code
fit_ets |> 
  forecast(h = 8) |> 
  autoplot(quarterly_sales_tsibble) +
  geom_line(aes(y = .fitted), data = augment(fit_ets), color = "blue")

In Practice

Real World Example: SEMI Forecasting

Global Silicon Wafer Shipments

Units are in “million square inches (MSI)”

Code
msi_historical <- read_tsv("datasets/MSI.txt", col_names = c("year", "Q1", "Q2", "Q3", "Q4"))
msi_historical

Wrangle the Data into a Long Format

year quarter msi
2001 Q1 1250
2001 Q2 988

And convert to a time series object (tsibble)

Code
msi_ts <- msi_historical |> 
  pivot_longer(cols = -year, names_to = "quarter", values_to = "quarter_value", ) |> 
  separate_wider_delim(cols = quarter_value, names = c("quarter_2", "msi"), delim = " ") |> 
  mutate(msi = as.numeric(str_remove(msi, ","))) |> 
  select(-quarter_2) |> 
  mutate(year_quarter = str_c(year, " ", quarter)) |> 
  select(year_quarter, msi) |> 
  mutate(year_quarter = yearquarter(year_quarter)) |> 
  filter(!is.na(msi)) |>
  as_tsibble(index = year_quarter)

msi_ts

Plot the Time Series

Code
msi_ts |> 
  autoplot(msi)

Fit and Forecast with ETS

Code
fit_ets_msi <- msi_ts |> 
  model(model_ets = ETS(msi ~ error("A") + trend("A") + season("A")))

fit_ets_msi |>
  forecast(h = 12) |>
  autoplot(msi_ts) +
  geom_line(aes(y = .fitted), data = augment(fit_ets_msi), color = "blue")

Forecasted Values using ARIMA

Code
fit_arima_msi <- msi_ts |> 
  model(model_arima = ARIMA(msi, stepwise = FALSE, approximation = FALSE))

fit_arima_msi |>
  forecast(h = 12) |>
  autoplot(msi_ts) +
  geom_line(aes(y = .fitted), data = augment(fit_arima_msi), color = "blue")

Forecasted Values by Year

Code
msi_three_year_forecast <- fit_arima_msi |>
  forecast(h = 14)

msi_three_year_forecast_annual <- as_tibble(msi_three_year_forecast) |> 
  mutate(year = year(year_quarter)) |> 
  group_by(year) |> 
  summarise(
    forecasted_msi = sum(.mean))

msi_three_year_forecast_annual
Code
msi_long_year <- msi_historical |> 
  pivot_longer(cols = -year, names_to = "quarter", values_to = "quarter_value", ) |> 
  separate_wider_delim(cols = quarter_value, names = c("quarter_2", "msi"), delim = " ") |> 
  mutate(msi = as.numeric(str_remove(msi, ","))) |> 
  select(-quarter_2) |> 
  group_by(year) |>
  summarise(
    msi = sum(msi, na.rm = TRUE)
  ) |>
  ungroup()

msi_long_year

Join the Data Frames

Code
msi_long_year_join <- msi_long_year |> 
  full_join(msi_three_year_forecast_annual, by = "year") |> 
  pivot_longer(cols = c(msi, forecasted_msi), names_to = "type", values_to = "msi")

msi_long_year_join

SEMI Forecast

Code
semi_forecast <- tibble(
  year = 2024:2027,
  semi_foreast_msi = c(12174, 13328, 14507, 15413)
)

semi_forecast

Plot the ARIMA vs SEMI Forecast

Code
msi_long_year_join |> 
  filter(year >= 2022) |>
  ggplot(aes(x = year)) +
  geom_col(aes(y = msi, fill = type)) +
  geom_point(aes(x = year, y = semi_foreast_msi), data = semi_forecast, color = "black") +
  theme_light()

Homework

Univariate Time Series Carbon Dioxide Data

Monthly \(\textrm{CO}_2\) Concentrations

Code
co2
        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
1959 315.42 316.31 316.50 317.56 318.13 318.00 316.39 314.65 313.68 313.18
1960 316.27 316.81 317.42 318.87 319.87 319.43 318.01 315.74 314.00 313.68
1961 316.73 317.54 318.38 319.31 320.42 319.61 318.42 316.63 314.83 315.16
1962 317.78 318.40 319.53 320.42 320.85 320.45 319.45 317.25 316.11 315.27
1963 318.58 318.92 319.70 321.22 322.08 321.31 319.58 317.61 316.05 315.83
1964 319.41 320.07 320.74 321.40 322.06 321.73 320.27 318.54 316.54 316.71
1965 319.27 320.28 320.73 321.97 322.00 321.71 321.05 318.71 317.66 317.14
1966 320.46 321.43 322.23 323.54 323.91 323.59 322.24 320.20 318.48 317.94
1967 322.17 322.34 322.88 324.25 324.83 323.93 322.38 320.76 319.10 319.24
1968 322.40 322.99 323.73 324.86 325.40 325.20 323.98 321.95 320.18 320.09
1969 323.83 324.26 325.47 326.50 327.21 326.54 325.72 323.50 322.22 321.62
1970 324.89 325.82 326.77 327.97 327.91 327.50 326.18 324.53 322.93 322.90
1971 326.01 326.51 327.01 327.62 328.76 328.40 327.20 325.27 323.20 323.40
1972 326.60 327.47 327.58 329.56 329.90 328.92 327.88 326.16 324.68 325.04
1973 328.37 329.40 330.14 331.33 332.31 331.90 330.70 329.15 327.35 327.02
1974 329.18 330.55 331.32 332.48 332.92 332.08 331.01 329.23 327.27 327.21
1975 330.23 331.25 331.87 333.14 333.80 333.43 331.73 329.90 328.40 328.17
1976 331.58 332.39 333.33 334.41 334.71 334.17 332.89 330.77 329.14 328.78
1977 332.75 333.24 334.53 335.90 336.57 336.10 334.76 332.59 331.42 330.98
1978 334.80 335.22 336.47 337.59 337.84 337.72 336.37 334.51 332.60 332.38
1979 336.05 336.59 337.79 338.71 339.30 339.12 337.56 335.92 333.75 333.70
1980 337.84 338.19 339.91 340.60 341.29 341.00 339.39 337.43 335.72 335.84
1981 339.06 340.30 341.21 342.33 342.74 342.08 340.32 338.26 336.52 336.68
1982 340.57 341.44 342.53 343.39 343.96 343.18 341.88 339.65 337.81 337.69
1983 341.20 342.35 342.93 344.77 345.58 345.14 343.81 342.21 339.69 339.82
1984 343.52 344.33 345.11 346.88 347.25 346.62 345.22 343.11 340.90 341.18
1985 344.79 345.82 347.25 348.17 348.74 348.07 346.38 344.51 342.92 342.62
1986 346.11 346.78 347.68 349.37 350.03 349.37 347.76 345.73 344.68 343.99
1987 347.84 348.29 349.23 350.80 351.66 351.07 349.33 347.92 346.27 346.18
1988 350.25 351.54 352.05 353.41 354.04 353.62 352.22 350.27 348.55 348.72
1989 352.60 352.92 353.53 355.26 355.52 354.97 353.75 351.52 349.64 349.83
1990 353.50 354.55 355.23 356.04 357.00 356.07 354.67 352.76 350.82 351.04
1991 354.59 355.63 357.03 358.48 359.22 358.12 356.06 353.92 352.05 352.11
1992 355.88 356.63 357.72 359.07 359.58 359.17 356.94 354.92 352.94 353.23
1993 356.63 357.10 358.32 359.41 360.23 359.55 357.53 355.48 353.67 353.95
1994 358.34 358.89 359.95 361.25 361.67 360.94 359.55 357.49 355.84 356.00
1995 359.98 361.03 361.66 363.48 363.82 363.30 361.94 359.50 358.11 357.80
1996 362.09 363.29 364.06 364.76 365.45 365.01 363.70 361.54 359.51 359.65
1997 363.23 364.06 364.61 366.40 366.84 365.68 364.52 362.57 360.24 360.83
        Nov    Dec
1959 314.66 315.43
1960 314.84 316.03
1961 315.94 316.85
1962 316.53 317.53
1963 316.91 318.20
1964 317.53 318.55
1965 318.70 319.25
1966 319.63 320.87
1967 320.56 321.80
1968 321.16 322.74
1969 322.69 323.95
1970 323.85 324.96
1971 324.63 325.85
1972 326.34 327.39
1973 327.99 328.48
1974 328.29 329.41
1975 329.32 330.59
1976 330.14 331.52
1977 332.24 333.68
1978 333.75 334.78
1979 335.12 336.56
1980 336.93 338.04
1981 338.19 339.44
1982 339.09 340.32
1983 340.98 342.82
1984 342.80 344.04
1985 344.06 345.38
1986 345.48 346.72
1987 347.64 348.78
1988 349.91 351.18
1989 351.14 352.37
1990 352.69 354.07
1991 353.64 354.89
1992 354.09 355.33
1993 355.30 356.78
1994 357.59 359.05
1995 359.61 360.74
1996 360.80 362.38
1997 362.49 364.34

Fit an ETS Model to the co2 data (1/ )

Convert to a tsibble

Code
co2_tsibble <- co2 |> 
  as_tsibble(index = yearmonth)

co2_tsibble

Fit an ETS Model to the co2 data (2/ )

Fit the ETS model

Code
fit_ets_co2 <- co2_tsibble |> 
  model(model_ets = ETS(value ~ error("A") + trend("A") + season("A")))

Fit an ETS Model to the co2 data (3/ )

Forecast and plot

Code
fit_ets_co2 |> 
  forecast(h = 324) |> 
  autoplot(co2_tsibble) +
  geom_point(x = as_date("2024-10-01"), y = 422, color = "red", size = 4) +
  theme_light()

Fit an ETS Model to the co2 data (4/ )

Plot the data, fit, and forecast

Code
co2_column_names <- c("year", "month", "decimal_date", "monthly_average", "deseasonalized", "n_days", "stdev_days", "uncertainty_mean")

co2_current_data <- read_table("https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_mm_mlo.txt", skip = 52, col_names = co2_column_names)

co2_current_data

Create a ts objectd

Code
co2_current_data_ts <- co2_current_data |> 
  select(monthly_average) |>
  ts(start = c(1959, 1), frequency = 12) |> 
  as_tsibble()

co2_current_data_ts

Fit an ETS Model to the co2 data (5/ )

Code
fit_ets_co2_current <- co2_current_data_ts |> 
  model(model_ets = ETS(value ~ error("A") + trend("A") + season("A")))

fit_ets_co2_current |> 
       forecast(h = 120) |>
  autoplot(filter(co2_current_data_ts, index >= yearmonth("2000-01-1"))) +
  scale_x_yearmonth(date_labels = "%Y") +
  labs(
    x = "Year",
    y = "CO<sub>2</sub> Concentration (ppm)"
  ) +
  theme_light() +
  theme(axis.title.y = ggtext::element_markdown())